Concepts of Programming Languages

On the importance of building bridges

Old Joe was a good worker, but he had no sense. You wouldn't leave him out on a bridge by himself; you never knew what he'd do. Just last year, I had Joe bolting gussets on the superstructure of the bridge over Data Stream – you know, that new bridge on the north frontage road of Information Highway? So there he was, short a bolt and no one to toss him one.

Never believe what he did: Joe climbed down from the steel, got into our old Northwest crane – he used to be an operating engineer, Joe did – and swung the boom out over the bolt bucket. He must have spent 45 minutes trying to attach that one bolt to the hook so he could lift the thing up. Finally the super came back from lunch and told him to use his pocket.

"Never thought of that," Joe told him. "It fits real nice, too."

Still, that wasn't so strange as Joe's last job. We were fitting some needle beams under the courthouse for that foundation repair job. Now, these things weigh in at 500 lbs a yard and they're about 35 ft each. So here's Joe trying to manhandle the beam through the opening and telling the foreman that the super doesn't want him to use the crane! Joe's got a memory, but no sense.

He does have disability retirement, though, and he's not doing too bad except when the pain acts up.

The joys of assembler language

Poor Joe. Not that he had no sense at all, but he had no sense of scale. A crane is not appropriate to lift a single bolt, and trying to use it just causes problems. But it is appropriate to use heavy equipment to move heavy objects. As a former operating engineer, Joe should have been able to see where the equipment would be helpful.

Now let us return to electronic data processing. Data is processed as electronic pulses or waveforms within the circuitry. If you have a need to deal with this raw level, you need to deal with physical circuits, the motion of electrons, and the propagation of field effects. But of course we, as data processors, do not have this need. We usually ignore the electronic niceties. (Do you really care, on a day-to-day basis, that the magnetic core memory was replaced by metal-oxide semiconductors?) Why not? Because the circuitry presents the data to us in a consistent but abstract form we call a "byte". Electronically, a byte is a transient effect of the processing – but it is a well-defined and stable effect which can be treated as if it were an object in itself.

This brings us to the level of the "machine". The computing machine is a system of electronics and abstract conventions which allow us to work conveniently with data at the level of the byte. (The same point applies to the computer "word", but the byte is the more universal concept.) If you have a need to deal with data at the level of the byte, working at the machine level is both easiest and most efficient. Why? Because the machine is specifically "sized" to work with the byte. How? The machine language includes single instructions to perform transformations of byte data. Working in the machine language, these operations can be executed directly, without overhead to manage the system or to discard extraneous electronic effects.

Machine language is the ideal language for processing byte-sized data. But it is difficult for non-machines to process because the machine language itself is coded in bytes. We therefore introduce mnemonics, codes which represent and substitute for the machine codes but are easier for humans to remember. Voilà! The assembler language. All the benefits of the machine language – since it is, after all, just another way to write the machine language – but missing some of the defects from the human point of view. (The price we pay for this is the extra step of translating from the assembler code to the actual machine language.) This change is so valuable that assemblers were enhanced to accept symbolic names (that is, symbolic addressing) and symbolic abbreviations (macros). This resulted in a language which is much more powerful than most people realize, one that can be coded quickly and which can make use of all the structured programming techniques.
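To make the idea of mnemonics and symbolic names concrete, here is a minimal sketch of that translation step. The opcodes, the four-instruction set, and the name TOTAL are all invented for illustration; they do not belong to any real machine or assembler.

```python
# Hypothetical one-byte opcodes for an imaginary machine (not a real instruction set).
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}


def assemble(source, symbols):
    """Translate mnemonic lines into machine-code bytes.

    Each line is a mnemonic followed by an optional operand. The operand is
    either a symbolic name (looked up in `symbols`) or a literal number.
    """
    code = bytearray()
    for line in source.strip().splitlines():
        parts = line.split()
        code.append(OPCODES[parts[0]])          # mnemonic -> machine code
        if len(parts) > 1:
            operand = parts[1]
            if operand in symbols:              # symbolic addressing
                code.append(symbols[operand])
            else:                               # literal value
                code.append(int(operand, 0))
    return bytes(code)


program = """
LOAD  TOTAL
ADD   5
STORE TOTAL
HALT
"""

# TOTAL is an assumed symbolic name for the (made-up) address 0x10.
machine_code = assemble(program, {"TOTAL": 0x10})
print(machine_code.hex(" "))
```

The human writes LOAD and TOTAL; the machine still receives nothing but bytes. That is the whole trick, and also the whole price.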

The evolution into abstraction

So much for the proper way to pick up bolts. If the assembler is so wonderful, why do we use other languages? Why didn't Joe put the beam in his pocket?

The truth is that most data is not processed at the level of the byte. A major advance came with the development of FORTRAN, which deals with values. True, the values are actually stored as bytes (or are they stored as electronic effects?) and true also that FORTRAN statements are translated into machine language for processing. Yet if you are dealing with data as value, FORTRAN is more efficient to use because its instructions are "sized" to values instead of bytes.
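The difference between working with a value and working with its bytes can be sketched in a few lines. Python stands in for FORTRAN here, and the price and the little-endian double layout are assumptions chosen only for the example.

```python
import struct

price = 19.95                      # a value, as the programmer sees it

# The same value as the machine stores it: 8 bytes (IEEE 754 double,
# little-endian layout assumed for this example).
raw = struct.pack("<d", price)

# Value level: one statement, "sized" to the value.
total = price * 3

# Byte level: the bytes must first be reassembled into a value
# before any arithmetic makes sense.
(decoded,) = struct.unpack("<d", raw)
total_from_bytes = decoded * 3

print(len(raw), total == total_from_bytes)
```

Both routes reach the same answer. The value-level route simply skips the bookkeeping that the byte-level route cannot avoid.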

Single values. But most data is not recorded or processed as single values. Most data is stored in more complex, and more abstract, structures – records, files, and now databases of many files. If you normally deal with data at these levels, a different language will be easier to use.

COBOL was invented to deal with data which is organized in a structured way: values and groups of values stored in records in files. For processing data as organized structures, COBOL has never been surpassed – as shown by its wide use and longevity. (Other languages also exist for similar purposes but have failed to achieve as much. RPG, perhaps the closest of COBOL's competitors, does not allow as much structure in the description of the data records, for example.)
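A rough sketch of what "values and groups of values stored in records" means, again in Python rather than COBOL. The employee record, its column positions, and every field name are hypothetical; a COBOL program would declare the same layout in its data division instead.

```python
from dataclasses import dataclass


@dataclass
class Name:                 # a group of values within the record
    first: str
    last: str


@dataclass
class Employee:             # the record: values plus a nested group
    emp_id: int
    name: Name
    hourly_rate_cents: int


def parse_employee(line: str) -> Employee:
    """Split one fixed-width record into its fields.

    Assumed layout: columns 0-4 id, 5-14 first name,
    15-24 last name, 25-29 hourly rate in cents.
    """
    return Employee(
        emp_id=int(line[0:5]),
        name=Name(first=line[5:15].rstrip(), last=line[15:25].rstrip()),
        hourly_rate_cents=int(line[25:30]),
    )


record = "00042" + "Joe".ljust(10) + "Bolt".ljust(10) + "02350"
emp = parse_employee(record)
print(emp.name.last, emp.hourly_rate_cents)
```

Once the layout is described, the program speaks of emp.name.last and never of column 15. The record, not the byte, has become the unit of work.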

Notice how the "higher" languages are dealing with the more abstract notions of data. The actual storage of the data doesn't change, but our understanding of the data does change. We look at more of the data at one time and we see more organization in the data. We even add new data to describe the data, beginning with indexes to data files. All of this is transparent to us; we simply stop looking at the individual bytes and instead see complete records and files. As we do this, we need to change our tools to ones which are built to handle, not bytes, but records and files. This is why application programs are not written in assembler.

Today, we are trying to learn to see not just single files but the full range of related data as one organized structure. We have not learned the best way to do this, as demonstrated by the multiplicity of database management systems and database languages vying for our attention and our dollars. The current issue of MIS Quarterly also has an interesting study concerning how a better (and more abstract) model of the database gives rise to a better query language.

Potholes on the road to Nirvana

Surely this means that we are slowly (oh, much too slowly) approaching the perfect language! But no, and two issues will serve to explain why there is no perfect language. The progress illustrated so far is real and important, but it assumes too much about the data we deal with to say that we are on the one road to perfection.

First of all, remember FORTRAN? It is alive and well among numerical analysts and certain engineering specialties. Why? Because these people do not deal with highly structured databases. The "advances" just described do not apply to them; indeed, the broader view is nothing but overhead.

Second, consider text processing – specifically language translation. Text, like all data, is stored as bytes. But the next levels of abstraction are not fields, records, and files, but words, sentences, and paragraphs. (What's worse, the hierarchy is constantly mutating, what with clauses, sections, and chapters.) Database languages are useless; field-based languages like COBOL can be used only if the data is constantly moved back and forth between text and field formats. Absent a specific language and data model to deal with text (and I've not seen any) it would be more efficient to return to the assembler level to process text than to use a "higher" language with inappropriate assumptions.
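For contrast with fields and records, here is a deliberately naive sketch of the text hierarchy: paragraphs made of sentences made of words. The splitting rules (blank lines end paragraphs, terminal punctuation ends sentences) are simplifying assumptions, and real text breaks them constantly, which is rather the point.

```python
import re


def parse_text(text):
    """Return a list of paragraphs; each paragraph is a list of
    sentences; each sentence is a list of words.

    Assumes blank lines separate paragraphs and that '.', '!' or '?'
    followed by whitespace ends a sentence.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        sentences = re.split(r"(?<=[.!?])\s+", block.strip())
        paragraphs.append([s.split() for s in sentences if s])
    return paragraphs


sample = "Joe has a memory. He has no sense.\n\nPoor Joe."
doc = parse_text(sample)
print(len(doc), len(doc[0]), doc[1][0])
```

Nothing here has a fixed width or a fixed count: a sentence holds as many words as it pleases. A record layout has no place to put that.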


Other issues:

The "product" is usually a continuing service, often with a long life, and not the specific program used to render the service. We don't yet have a language which expresses the on-going service provided by people and machines.

This discussion deals only with well-defined, objective data. Most human language is designed as much to inspire thoughts and feelings as to convey information – perhaps more. So "natural" language is inherently ambiguous, difficult to process, and dependent on the context.


For Mark Delforge, March 18, 1994.
Marked up October, 2002, and February, 2010.